ESSNet on Statistical Disclosure Control

Task 8. Analysis of problems on linked tables.

When tables are linked through simple linear constraints, secondary cell suppressions must obviously be coordinated between those tables. Otherwise it might happen, for instance, that a cell is suppressed in one table because it was selected as a secondary suppression there, while it remains unsuppressed in another table. Basically, there are two alternative options to prevent this. The first option is to include all the linking constraints in the formulation of one single secondary cell suppression problem. The alternative is to break this big, original cell suppression problem into a set of sub-problems, e.g. by selecting secondary suppressions separately in a set of individual tables (or sub-tables, respectively). While processing these individual (sub-)tables separately, we note any secondary suppression that also belongs to one of the other (sub-)tables, suppress it in that other (sub-)table as well, and carry out (or, if necessary, repeat) the cell suppression procedure for that other (sub-)table. This approach is called a ‘backtracking procedure’. Although the cell suppression procedure will usually be repeated several times for each (sub-)table within a backtracking process, the computational effort for the whole process is normally much smaller than for solving a single big problem.
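The backtracking procedure can be illustrated with a minimal Python sketch. It is a toy under strong assumptions: each table is represented as a list of additive relations over named cells (a cell shared between tables carries the same name in each of them), and the hypothetical function protect_table stands in for a real per-table cell suppression routine. It uses a naive rule (a relation with exactly one suppressed cell gets its smallest unsuppressed cell suppressed as well) instead of an optimization method such as the one used in τ-ARGUS. The hypothetical function backtrack re-queues any table that contains a newly suppressed cell until no table changes any more.

```python
from collections import deque

# Hypothetical toy representation: a table is a list of additive relations,
# each relation being the names of the cells in one constraint (parts plus total).
# A cell shared by several tables carries the same name in each of them.
Relation = list[str]
Table = list[Relation]


def protect_table(table: Table, values: dict[str, float], suppressed: set[str]) -> set[str]:
    """Toy stand-in for a per-table secondary suppression routine.

    Assumed rule: a relation with exactly one suppressed cell would let that cell
    be recomputed, so the smallest unsuppressed cell of the relation is suppressed too.
    Returns the cells newly suppressed in this table.
    """
    new: set[str] = set()
    changed = True
    while changed:
        changed = False
        for relation in table:
            hidden = [c for c in relation if c in suppressed or c in new]
            if len(hidden) == 1:
                candidates = [c for c in relation if c not in suppressed and c not in new]
                if candidates:
                    new.add(min(candidates, key=lambda c: (values[c], c)))
                    changed = True
    return new


def backtrack(tables: dict[str, Table], values: dict[str, float], primaries: set[str]) -> set[str]:
    """Process tables one by one; re-queue a table whenever it contains a new suppression."""
    suppressed = set(primaries)
    queue = deque(tables)      # table names, in insertion order
    queued = set(tables)
    while queue:
        name = queue.popleft()
        queued.discard(name)
        new = protect_table(tables[name], values, suppressed)
        suppressed |= new
        for other, table in tables.items():
            shares_new_cell = any(c in new for relation in table for c in relation)
            if other != name and other not in queued and shares_new_cell:
                queue.append(other)   # backtrack: this table must be protected again
                queued.add(other)
    return suppressed


# Toy example: cell y appears both in T1 (y + w = u) and in T2 (x + y + z = t).
values = {"x": 3, "y": 5, "z": 40, "t": 48, "w": 12, "u": 17}
tables = {"T1": [["y", "w", "u"]], "T2": [["x", "y", "z", "t"]]}
print(sorted(backtrack(tables, values, primaries={"x"})))  # ['w', 'x', 'y']
```

Here T1 is processed first without any change; processing T2 then suppresses y, which also belongs to T1, so T1 is processed a second time and w is suppressed there.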
The software τ-ARGUS offers a choice between four different approaches for the selection of secondary suppressions. Based on the results of extensive testing with large, hierarchical, mostly 2-dimensional (in a few instances 3-dimensional) tabular economic data, the CENEX SDC Handbook (Hundepool et al., 2006) recommends the method Modular (cf. De Wolf, 2002) among these four. This method takes a backtracking approach, applying the Fischetti/Salazar linear optimization algorithm (Salazar Gonzalez, 2000) to select secondary suppressions within the individual sub-tables.
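To make the notion of sub-tables more concrete, the following Python sketch splits a one-dimensional hierarchical classification into sub-tables, each consisting of one parent and its direct children, listed top-down. This is an assumption covering only the simplest case; Modular works on multi-dimensional hierarchical tables, and its actual procedure is described in De Wolf (2002). The function name subtables and the example codes are hypothetical. The output shows how sub-tables are linked: every intermediate subtotal is the total of one sub-table and, at the same time, an inner cell of the sub-table above it, so suppressions in one sub-table can affect another.

```python
from collections import deque


def subtables(hierarchy: dict[str, list[str]], root: str) -> list[list[str]]:
    """Split a 1-dim hierarchy into sub-tables, visited top-down (breadth-first).

    Each sub-table is one additive relation: the direct children followed by their parent.
    """
    result: list[list[str]] = []
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        children = hierarchy.get(parent, [])
        if children:                          # leaves do not form a sub-table
            result.append(children + [parent])
            queue.extend(children)
    return result


# Hypothetical NACE-like hierarchy: Total -> sections A, B -> divisions.
nace = {"Total": ["A", "B"], "A": ["A01", "A02"], "B": ["B05", "B06"]}
for sub in subtables(nace, "Total"):
    print(sub)
# ['A', 'B', 'Total']
# ['A01', 'A02', 'A']
# ['B05', 'B06', 'B']
```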
The current version of τ-ARGUS does not yet offer processing of general linked tables with Modular.
Within this project, we will study the problem of protecting linked tables in typical standard situations, such as tables having a column or layer in common (e.g. NACE x Region and NACE x Size Classes) or a set of tables with different degrees of detail in certain dimensions (e.g. NACE-3 x Region1 and NACE-1 x Region2). Another aspect is dissemination history. Tables are not always produced at the same time; in that situation, the suppression pattern of table 1 is the starting point for the protection of table 2: cells that remained unsuppressed in table 1 must remain unsuppressed in table 2 as well. In the following, we refer to this as non-iterative processing. A typical instance has been studied in Giessing (2007).
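The constraint of non-iterative processing can be sketched in Python by excluding all cells already published with table 1 from the candidates for secondary suppression in table 2. The sketch is hypothetical: the function protect_non_iterative, the relation-list representation and the naive selection rule are assumptions, not the methodology to be developed. It shows how infeasibility can arise: if a NACE-1 group has only one NACE-3 class, both cells are equal, so a primary suppression of that class cannot be protected once the group total has been released with table 1.

```python
def protect_non_iterative(
    table: list[list[str]],
    values: dict[str, float],
    primaries: set[str],
    published: set[str],
) -> tuple[set[str], list[int]]:
    """Toy secondary suppression for table 2 when the cells in `published` were
    already released with table 1 and therefore must stay unsuppressed.

    Assumed rule: a relation with exactly one suppressed cell gets its smallest
    unsuppressed, unpublished cell suppressed too. Returns the suppression
    pattern and the indices of relations that cannot be protected.
    """
    suppressed = set(primaries)
    conflicts: list[int] = []
    changed = True
    while changed:
        changed = False
        for i, relation in enumerate(table):
            hidden = [c for c in relation if c in suppressed]
            if len(hidden) != 1:
                continue
            candidates = [c for c in relation if c not in suppressed and c not in published]
            if candidates:
                suppressed.add(min(candidates, key=lambda c: (values[c], c)))
                changed = True
            elif i not in conflicts:
                conflicts.append(i)   # the single suppressed cell is recomputable


    return suppressed, conflicts


# Hypothetical table 2: NACE-3 classes a, b under group g, and class c as the
# only class of group h; the group totals g and h were published with table 1.
values = {"a": 4, "b": 30, "g": 34, "c": 7, "h": 7}
table2 = [["a", "b", "g"], ["c", "h"]]
pattern, conflicts = protect_non_iterative(table2, values, {"a", "c"}, published={"g", "h"})
print(sorted(pattern), conflicts)  # ['a', 'b', 'c'] [1]
```

Relation 0 can still be protected by suppressing b, whereas relation 1 is reported as a conflict: any approach for non-iterative processing has to decide how to handle such cases.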
In the first year, we will study some typical instances and propose methodology to solve these linked-table problems. A report will suggest both backtracking strategies for linked tables with hierarchical structure and several alternative approaches for handling infeasibility problems in non-iterative processing situations.
If possible, a first step towards implementing the proposed solutions for certain typical situations could be envisaged in year 2. However, it may turn out that the proposed methodology leads to computationally very hard problems, which would require a larger project involving academic specialists in this field.
Partners: DE, CBS
Deliverables: Report after year 1 and a first implementation in τ-ARGUS at the end of year 2
References:
De Wolf, P.P. (2002), ‘HiTaS: A Heuristic Approach to Cell Suppression in Hierarchical Tables’, In: Domingo-Ferrer, J. (Ed.), ‘Inference Control in Statistical Databases’, Springer (Lecture Notes in Computer Science, Vol. 2316).
Giessing, S. (2007), ‘Balancing the amount of detail in 3-dimensional hierarchical tabular economic data’, Proceedings of the ISI 2007, Lisbon.
Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Lenz, R., Longhurst, J., Schulte-Nordholt, E., Seri, G., de Wolf, P.P. (2006), ‘CENEX SDC Handbook on Statistical Disclosure Control’, version 1.01, available at http://research.cbs.nl/casc/handbook.html
Salazar Gonzalez, J.J. (2000), ‘Models and Algorithms for Optimizing Cell Suppression in Tabular Data with Linear Constraints’, Journal of the American Statistical Association, 95, 916-928.